Using medicare claims to estimate risk-adjusted performance of Pennsylvania trauma centers

Trauma centers use registry data to benchmark performance using a standardized risk adjustment model. Our objective was to utilize national claims to develop a risk adjustment model applicable across all hospitals, regardless of designation or registry participation. Patients from 2013–14 Pennsylvania Trauma Outcomes Study (PTOS) registry data were probabilistically matched to Medicare claims using demographic and injury characteristics. Pairwise comparisons established facility linkages and matching was then repeated within facilities to link records. Registry models were estimated using GLM and compared with five claims-based LASSO models: demographics, clinical characteristics, diagnosis codes, procedures codes, and combined demographics/clinical characteristics. Area under the curve and correlation with registry model probability of death were calculated for each linked and out-of-sample cohort. From 29 facilities, a cohort comprising 16,418 patients were linked between datasets. Patients were similarly distributed: median age 82 (PTOS IQR: 74–87 vs. Medicare IQR: 75–88); non-white 6.2% (PTOS) vs. 5.8% (Medicare). The registry model AUC was 0.86 (0.84–0.87). Diagnosis and procedure codes models performed poorest. The demographics/clinical characteristics model achieved an AUC = 0.84 (0.83–0.86) and Spearman = 0.62 with registry data. Claims data can be leveraged to create models that accurately measure the performance of hospitals that treat trauma patients.


Introduction
Trauma is unplanned, time-sensitive, and carries a high societal burden of mortality and morbidity. The National Academy of Medicine recently highlighted the need for a national trauma system [1]. Over the past three decades, hospitalizations for injury have risen among those over age 65, with over three million injury-related Emergency Department (ED) visits and 55,000 deaths in 2017 alone [2][3][4]. The system of care developed around trauma-from ambulance services and emergency departments to intensive care units and rehabilitation centersmarkedly reduces injury-related death and disability. Survival is significantly higher for seriously injured patients who are treated in Level I Trauma Centers [5][6][7]. Despite this, approximately two-thirds of patients are cared for in non-trauma centers, as are 70% of injured adults over 55 who receive hospital care [7][8][9]. Understanding the care of the trauma patient population, and identifying opportunities to improve trauma care at non-trauma centers is important for improving outcomes for the entire population of injured adults.
The trauma community has pioneered efforts to measure and benchmark trauma center performance through the Trauma Quality Improvement Program (TQIP). These efforts use a standardized risk adjustment model to benchmark trauma center performance [10]. Many states (43%) have trauma data registries [11], providing an indispensable resource for quality assurance and improvement initiatives. Data are often reported to a central credentialing agency, such as a state Department of Health or the American College of Surgeons, and are pooled to describe the experience of the state or nation in performance improvement initiatives. Facility level outcomes are benchmarked and compared in a de-identified manner across similar centers [10,12].
A key shortcoming of trauma registries is that only 13 states require non-trauma center hospitals to submit their data, limiting the ability to measure care delivered outside of the accredited trauma centers. The National Quality Forum recently released a framework for population based quality measurement for trauma care in the United States [13]. Enacting this will require a standardized and comprehensive approach that leverages compulsory data rather than a voluntary registry limited to specially credentialled hospitals.
The Centers for Medicare & Medicaid (CMS) are the largest purchaser of trauma care in the United States [14]. Claims data for age-eligible beneficiaries (65 years or older) are available uniformly at trauma and non-trauma centers, and contain information including patients' injury mechanism and severity, physiology, comorbidities, and disposition. The objective of our study was to capitalize on these comprehensive data to develop a risk adjustment model that can be applied across trauma and non-trauma centers alike. This approach to measuring trauma outcomes and hospital performance aims to deliver a method of assessing trauma care quality, regardless of hospital designation or registry participation.

Linking
A total of 29,063 cases across 33 trauma centers were identified as eligible for linking in the PTOS data, while 103,621 possible cases across 185 centers were identified from the Medicare claims. Patient-to-Patient linking required a total of 6,105 facility-to-facility comparisons ( Fig 1). The smaller number of cases in the PTOS data dominated the linking attempts creating a matched cohort comprising 16,418 cases (16,058 unique patients) from 29 facilities linked between PTOS and Medicare. These facilities included 11 level I, 13 level II, 2 level III, 1 level IV, and 3 dual accredited pediatric (level I or II) and adult (level I) trauma centers. With regard to demographics, the patients were distributed similarly in both cohorts: median age 82 (IQR: 74-87) in PTOS versus 82 (IQR: 75-88) in Medicare; 6.2% non-white in PTOS versus 5.8% non-white in Medicare. The registry cohort comprised more instances of falls (n = 13,654, 83.2% compared to n = 12,134, 73.9% in claims data). Also, more registry patients had a max abbreviated injury score < 3 (n = 7,633, 46.5% versus n = 7,315, 44.6%). Table 1 shows the demographic and injury characteristics of the linked and full cohorts.

PLOS DIGITAL HEALTH
For manual validation of the facility matching, 18 facilities volunteered to be identified in the PTOS registry. This allowed us to manually validate our automated facility matching and confirm that all 18 of these facilities were correctly paired between PTOS and Medicare datasets. For the remaining 11 facilities, trauma center designation was cross-referenced and matched in both the PTOS and Medicare data. Characteristics between the datasets appear similar with lower rates of transfers and higher prevalence of peripheral vascular disease (PVD) and diabetic comorbidities seen in the Medicare cohort (Table 1).

Registry-based benchmarking
Using the linked PTOS registry data, an AUC of 0.86 (95% CI 0.84-0.87) was estimated for the GLM model and 0.85 (95% CI 0.83-0.86) for the LASSO model. Probability of death values were then generated from each model for each individual case: the two models agreed considerably (Pearson correlation = 0.98; Spearman rank coefficient = 0.93). Given the agreement in both AUC and probability of death correlations, and the need to handle large numbers of clinical attributes in the CMS dataset, we chose the LASSO model for all subsequent analyses. LASSO's dimensionality reduction eliminated 41.5% of the predictor variables (S1 Table) with minimal impact to AUC.

Model performance
The initial Medicare claims model included only beneficiary demographics: age, sex, and race/ ethnicity. Using the linked PTOS-Medicare cases, the model predicted mortality with an AUC of 0.62 [95% CI 0.60-0.64]. The out-of-sample AUC was similar ( Table 2). Comparison of the claims-based model and the registry model produced a Spearman rank correlation of 0.44

Discussion
We found that claims-based models using Medicare data provide a good approximation for the risk adjustment approach that can be achieved through registry data. The variables in the claims-based models include many of the categories found to be important in the original Trauma Quality Improvement Project paper, namely patient demographics, comorbidities and key characteristics of the injury (i.e., mechanism, severity, body region impacted) [10,12].
The AUC for the demographics and clinical risk factors model exceeded 0.80 for both the matched and out-of-sample cohorts, while correlation with the gold-standard registry model exceeded 0.60. Using claims to estimate the probability of death after injury allows for a more generalizable method by which to benchmark trauma center performance. More importantly, it represents a novel approach for benchmarking outcomes at non-trauma center hospitals in order to capture an accurate representation of regional injury outcomes [1]. We also explored the use of the top 500 procedure and diagnostic codes in improving our ability to predict mortality. Inclusion of diagnostic codes improved the AUC to >0.9 for all samples. However, it led to a decreased correlation with probability of death deduced from registry data. We note that diagnosis and procedure codes include information about care delivered after the patient reaches the hospital: it is conceivable that inclusion of this information may overestimate the probability of death in comparison to models that only use information available at the time of the injury or hospital arrival. The goal of the risk adjustment model is not to simply predict death, but to allow for measurement of outcomes after accounting for differences in a patient population. Additionally, the risk adjustment model is not intended to penalize hospitals that treat more ill or more severely injured patients. Consequently, the inclusion of diagnosis and procedure codes may be overcorrecting for the baseline severity, accounting instead for the complications or subsequent care needed throughout the hospitalization.
Analogous to GLM, LASSO models are easy to interpret because features are combined linearly and weights are assigned to each feature (S1 and S2 Tables). LASSO modeling is an innovative approach to risk adjustment and represents an advance over earlier GLM or traditional logistic regression models. Specifically, LASSO's L1 regularization technique allows for weights to be zero, resulting in dimensionality reduction. In turn, this helps discard features when dealing with a large number of claims-based variables [15][16][17]. LASSO's dimensionality reduction specifically helped to adjust for the hundreds of CMS attributes considered in our claims-based models, including demographics, clinical risk factors, and diagnosis and procedure codes. Unchanged, this same method can be applied to risk adjustment efforts that rely on other large data sources. Additionally, it could prove especially important for conditions with no national registry, such as sepsis or cardiac arrest, where generalizability of existing data is a concern.
Our use of administrative data introduced a number of analytical limitations that require discussion. First, by necessity, we relied on probabilistic matching to create a dataset for model building and performance assessment. Despite our use of stringent filtering criteria throughout, it remains formally possible, the PTOS and Medicare linkages may be incorrect for some of the analyzed patients. In light of this possibility, we gauged the generalizability of our models by using both matched and out-of-sample cohorts. Second, our analyses used only those variables that Medicare collects for billing purposes. Information about the timing and location of the injury, or the specifics of the injury itself, are not readily available. Because of this, all injury characteristics were derived. Validated measures were used for these variables, but information may not be complete or accurate given differences in coding practices among hospitals, and the limited number of fields available for reporting. Similarly, this lack of granularity extends to clinical confounders and administrative data may not capture all treatments, interventions, and laboratory. Our models looking at the top diagnoses and procedure codes were an attempt to include as many of the captured codes as possible and evaluate their contribution to claims-based risk adjustment models.
We also note that it may be valuable to perform risk adjustment in younger patients without having to rely on trauma registry data. Similarly, our analysis focused on trauma centers in Pennsylvania. but multi-state or national models may be needed. In both instances, the modeling approach described can be applied to other administrative data sources, as well as Medicare data from trauma centers and non-trauma hospitals across the country. Subsequent validation and calibration work should be completed in the case of non-trauma centers and for models including younger patients, and additional years beyond 2013-2014 may also be warranted. Nonetheless, it will be possible to create general models easily given that the same variables are available across all hospitals that serve Medicare patients, and that similar data and coding is used in numerous other data sources captured for billing purposes.

Data sources and population
We used two data sources. One was the Pennsylvania Trauma Outcomes Study (PTOS) registry data from the Pennsylvania Trauma Systems Foundation, which are the type of data used for conventional trauma center benchmarking. The second source was CMS claims data for hospitals in Pennsylvania, including inpatient stays from the Medicare Provider Analysis and Review (MedPAR) Research Identifiable File (RIF), outpatient visits from the Outpatient claims RIF, and demographic, geographic, date-of-death, and enrollment information from the Master Beneficiary Summary File (Base Segment) and Vital Status File. All injury-related patients from the PTOS and CMS data were included for analysis if they were �65 years and treated in a Pennsylvania hospital in 2013 or 2014. In the CMS data, ICD-9-CM codes were used to identify injury as a primary or secondary diagnosis, (ICD9-CM 800-959 (injuries) excluding 905-909 (late effects of injury), 930-939 (foreign bodies), or 958 (complications of injury), as defined by the National Center for Injury Prevention and Control).

Feature engineering
For each dataset, admission date and date of birth were divided into month, day, and year components to facilitate matching. Naming conventions for Race/Ethnicity data were standardized across data sources. In-hospital death in CMS data was derived from the Vital Status file, and a beneficiary was coded as deceased if they had a valid date of death on or before the date of discharge, regardless of the disposition code. Injury severity scores (ISS), Abbreviated Injury Scores (AIS), and maximum AIS by body region in claims data were calculated using ICD-9 and E-Codes via the icdpicr (version 1.0.0) package [18]. In registry data, age, systolic blood pressure, and pulse rates were categorized into seven percentiles to allow for easier interpretability and more parsimonious models.

Matching patients across data sources
In order to compare risk adjustment between claims data and the gold standard registry data, we created a matched cohort comprised of patients whose records could be identified in the two data sources, Medicare claims and PTOS. All trauma cases in the PTOS registry were eligible for the matching process if they met age criteria (�65 years) and contained values for the following attributes: sex, birth date, injury date, race/ethnicity, residential zip code, and hospital ID. Trauma cases in the CMS data were deemed eligible for the matching process if they met the inclusion and exclusion criteria of the PTOS registry for the year of their claim [19]. To increase the match rate and validate matches between the two datasets, we requested permission from Pennsylvania trauma centers to use their hospital identity.

Linking claims and registry records
Facility to facility matching. To improve the quality of matches, we first performed pairwise comparisons between all facilities in the PTOS and CMS datasets using deidentified hospital facility identifiers. For each facility pair, we calculated two metrics: 1. CMS cohort metric: each individual Medicare trauma case in the candidate facility was given the maximal score when compared against each PTOS case in the candidate facility. One point was earned for matching each attribute (sex, residential zip code, year of birth, month of birth, day of birth, year of admission, month of admission, day of admission, and a combined race/ethnicity value). The mean score across all Medicare cases for the candidate facility was used as the CMS cohort metric.
2. PTOS cohort metric: because the number of cases in the CMS and PTOS cohorts differed, we calculated a similar metric in the opposite direction. Specifically, each individual PTOS case in the candidate facility was given the maximal score when compared against each CMS case in the candidate facility using the same point system noted above. The mean score across all PTOS cases for the candidate facility was used as the PTOS cohort metric.
For each CMS facility, the compared PTOS facilities were ranked in decreasing order of the CMS cohort metric. Similarly, for each PTOS facility, the compared CMS facilities were ranked in decreasing order of the PTOS cohort metric. A facility was deemed a match only if both the CMS and PTOS cohort metrics were top-ranked for the pair. Manual validation of the matched facilities occurred using known facility pairs from the subset of facilities that provided permission.

Matching claims and registry records within facilities
Within each matched facility, a similar process was used to identify cases whose Medicare record could be linked to their PTOS record. Using the same attributes as in the preceding section, pairwise comparisons were performed between CMS and PTOS records within the matched facility, earning one point for each attribute. Records were considered linked only if all of the following conditions were met: (i) there was no more than one mismatched attribute, (ii) for either patient in the pair, there was no other patient it paired with that scored the same or better, and (iii) the patient outcome (died vs survived) status agreed between the CMS and PTOS datasets.

Model development
All models used in-hospital death (survival to discharge) as the outcome. Two registry-based models were estimated using demographics, injury characteristics, comorbidities [10,12], and clinical characteristics to create a parsimonious model: model 1 was estimated using Generalized linear models (GLM); and model 2 was estimated using Least absolute shrinkage and selection operator (LASSO) ( Table 3) [15,20]. For all claims-based models, LASSO was used given the large number of clinical features. Models were generated using matched patients. They were also evaluated against unmatched (out-of-sample) cases to gauge their generalizability against unseen outcomes (Fig  1). To avoid data leakage and overfitting, no cases evaluated in the unmatched cohort were part of the matched training set used during model creation. Ninety-five percent confidence intervals (CI) were computed using 2000 stratified bootstrap replicates. All analyses were conducted using R version 3.6.3 (R Foundation for Statistical Computing, Vienna, Austria). GLMs and the LASSO models were created using the stats and glmnet packages, respectively [15]. For LASSO, 10-fold cross-validation was used to choose the optimal lambda value. For both the GLM and LASSO models, CI for the area under the curve (AUC) were calculated using the pROC version 1.16.2 package [21]. Injury characteristics in claims data were calculated using the icdpicr (version 1.0.0) package [18]. Table 3 lists the variables used across the registry and claims-based models. Registry models included previously identified variables [10,12]. For claims-based models, variables were selected for inclusion if they were an exact match for those in TQIP (age, sex, single worst Abbreviated Injury Score, mechanism of injury, transfer status), or if they were the billing code equivalent of the physiologic factors captured in registry data (impaired sensorium, tachycardia, altered mental status, shock). Additional comorbidities, patient demographics, clinical characteristics, and the 500 most frequent diagnosis and procedure codes were also considered. Variables were then grouped into three categories: demographics, procedure and diagnosis codes, and clinical risk factors.

Correlations using patient-level probability of death
After fitting each model outlined above, the probability of death was estimated for each patient in the matched cohort. For out-of-sample testing, the same models were used to predict death outcomes for the eligible but unmatched cohort. The AUC was calculated for each model, and separately for each out-of-sample evaluation, to estimate the models' ability to predict a death outcome. To avoid data leakage when calculating the out-of-sample AUC, the models were created using only the matched cases with no overlap with the unmatched cohort. For comparison between each of the claims-based models and the registry model, we calculated the Spearman rank correlation coefficient using the patient-level outcome probabilities for the matched patients between each model pair.
Meeting Presentations: A portion of this work was presented at the 2020 American Association for the Surgery of Trauma Annual Meeting and Clinical Congress of Acute Care Surgery and the 2020 European Congress of Trauma and Emergency Surgery (cancelled due to COVID-19).
Supporting information S1